ggml : add ggml_scale_bias #14417

Merged: 19 commits merged into ggml-org:master on Jul 9, 2025

Conversation

@ngxson (Collaborator) commented on Jun 27, 2025

Ref discussion: #14400 (comment)

Added ggml_scale_bias(ctx, a, s, b) in this PR, which allows calculating x = a * s + b

I only added the Metal kernel for now, just for discussion. @ggerganov does this look good to you?

TODO: support other backends
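
A hedged usage sketch of the new op (graph construction only; the tensor size and the scale/bias values are illustrative, and actually evaluating the graph needs a backend and a compute call):

    #include "ggml.h"

    int main(void) {
        // small scratch context, just for the illustration
        struct ggml_init_params params = {
            /*.mem_size   =*/ 16*1024*1024,
            /*.mem_buffer =*/ NULL,
            /*.no_alloc   =*/ false,
        };
        struct ggml_context * ctx = ggml_init(params);

        struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);

        // x[i] = a[i]*2.0f + 1.0f -- the fused scale + bias this PR adds
        struct ggml_tensor * x = ggml_scale_bias(ctx, a, 2.0f, 1.0f);

        (void) x;
        ggml_free(ctx);
        return 0;
    }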

@ngxson (Collaborator, Author) commented on Jun 27, 2025

I hope this won't have a significant impact on performance.

github-actions bot added the testing, ggml, and Apple Metal labels on Jun 27, 2025
@ggerganov (Member) left a comment

Think it's a useful extension of the operator.

Comment on lines 3968 to 3971:

    ggml_vec_scale_f32(nc, (float *) ((char *) dst->data + i1*nb1), s);
    if (b != 0.0f) {
        ggml_vec_acc1_f32(nc, (float *) ((char *) dst->data + i1*nb1), b);
    }
@ggerganov (Member):

Merge these into ggml_vec_mad1_f32(). If you want, you can try to add a GGML_SIMD version using GGML_F32_VEC_FMA - it's quite simple. But you can also leave it as a basic for loop without SIMD.
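
A minimal scalar sketch of the merged helper, assuming the name from the review (the version eventually merged also gained a separate input pointer x; see the commit list at the end):

    // one pass over y computing y[i] = y[i]*s + b, replacing the separate
    // scale + acc1 calls above; a GGML_SIMD variant would broadcast s and b
    // into GGML_F32_VEC registers and use GGML_F32_VEC_FMA in the main loop,
    // keeping this scalar loop for the leftover elements
    inline static void ggml_vec_mad1_f32(const int n, float * y, const float s, const float b) {
        for (int i = 0; i < n; ++i) {
            y[i] = y[i]*s + b;
        }
    }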

@ehoogeveen-medweb commented
Quick question: Is the "scale-bias" nomenclature more appropriate here than "multiply-add"? From an outsider perspective familiar with fused multiply-add ("MAD") operations, I didn't realize that "scale" meant "multiply" and "bias" meant "add" until I took a closer look.

@ngxson (Collaborator, Author) commented on Jun 29, 2025

"multiply-add" can be confusing because we already have ggml_mul and ggml_add, which take 2 tensors as input.

ggml_scale, on the other hand, takes a tensor and a scalar value as input.

So ggml_scale_bias is the best fit, because the naming doesn't clash with "multiply" or "add". "Bias" means we add a scalar value, not a tensor.
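
To illustrate the distinction (all three ops exist in ggml; ctx, a, and b here are a hypothetical context and F32 tensors):

    struct ggml_tensor * t1 = ggml_mul(ctx, a, b);                  // "multiply": two tensor inputs
    struct ggml_tensor * t2 = ggml_scale(ctx, a, 2.0f);             // "scale": tensor times a scalar
    struct ggml_tensor * t3 = ggml_scale_bias(ctx, a, 2.0f, 1.0f);  // scalar scale plus scalar bias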

@ngxson (Collaborator, Author) commented on Jul 8, 2025

@ggerganov On second thought, I'm worried that extending the kernel for ggml_scale will have a negative impact on performance.

I had a look into ggml_add1 and just realized that it's actually ggml_add with broadcasting under the hood (on CUDA, the same kernel is used for the 2 ops) - so I'm wondering: should we add a new GGML_OP_ADD1 that supports adding a scalar value?

My idea (see the sketch below) is that:

  • ggml_add1(ctx, a, b) will simply call ggml_add(ctx, a, b) under the hood
  • ggml_add1_scalar(ctx, a, val) is added, which supports a scalar value
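
A hypothetical sketch of that proposal (it was ultimately dropped in favor of ggml_scale_bias, and ggml_add1_scalar was never added; the body is illustrative only):

    // proposal sketch only -- not what was merged (assumes "ggml.h")
    struct ggml_tensor * ggml_add1(struct ggml_context * ctx,
                                   struct ggml_tensor  * a,
                                   struct ggml_tensor  * b) {
        // ggml_add already broadcasts the single-element tensor b over a,
        // so ADD1 reduces to ADD; ggml_add1_scalar(ctx, a, val) would instead
        // stash val in op_params, the way ggml_scale stores its scalar
        return ggml_add(ctx, a, b);
    }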

Edit: see comment below

@jeffbolznv (Collaborator) commented
What is the concern with performance? Adding a constant value is about as cheap as it gets.

@ngxson (Collaborator, Author) commented on Jul 8, 2025

Hmm, ok, maybe I'm just too concerned about the fact that cheap ops can still make an impact if they are called repeatedly.

A quick search in llama.cpp reveals that ggml_scale is called at most once or twice per layer. So I think the impact won't be as significant as I thought, unless a model has hundreds of thousands of layers.

So I'll go back to the initial proposal of ggml_scale_bias for now.

github-actions bot added the Nvidia GPU, Vulkan, SYCL, Ascend NPU, and OpenCL labels on Jul 8, 2025
Comment on lines 356 to 357:

    vDSP_vsmul(y, 1, &s, y, 1, n);
    vDSP_vsadd(y, 1, &b, y, 1, n);
@ggerganov (Member) commented on Jul 9, 2025:
There is vDSP_vsmsa.

@ngxson (Collaborator, Author) replied:
implemented in 563aca0
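
A minimal sketch of that replacement, assuming Accelerate's vDSP API (vDSP_vsmsa computes D[n] = A[n]*B + C with scalar B and C, so the two calls above collapse into one fused pass):

    #include <Accelerate/Accelerate.h>

    // y[i] = y[i]*s + b in a single vDSP call, in place with stride 1
    static void scale_bias_vdsp(float * y, const float s, const float b, const int n) {
        vDSP_vsmsa(y, 1, &s, &b, y, 1, (vDSP_Length) n);
    }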

Comment on lines 359 to 388:

    #if defined(__ARM_FEATURE_SVE)
        const int sve_register_length = ggml_cpu_get_sve_cnt() * 8;
        const int ggml_f32_epr = sve_register_length / 32; // SVE128:4, SVE256:8, SVE512:16
        const int ggml_f32_step = 2 * ggml_f32_epr;

        GGML_F32_VEC vs = GGML_F32_VEC_SET1(s);
        GGML_F32_VEC vb = GGML_F32_VEC_SET1(b);

        const int np = (n & ~(ggml_f32_step - 1));
        svfloat32_t ay1;
        svfloat32_t ay2;
        for (int i = 0; i < np; i += ggml_f32_step) {
            ay1 = GGML_F32_VEC_LOAD(y + i);
            ay1 = GGML_F32_VEC_FMA(ay1, vs, vb);
            GGML_F32_VEC_STORE(y + i, ay1);

            ay2 = GGML_F32_VEC_LOAD(y + i + 1*ggml_f32_epr);
            ay2 = GGML_F32_VEC_FMA(ay2, vs, vb);
            GGML_F32_VEC_STORE(y + i + 1*ggml_f32_epr, ay2);
        }
        // leftovers: the maximum number of leftover elements will be less than
        // ggml_f32_epr; apply predicated ops on the available elements only
        if (np < n) {
            svbool_t pg = svwhilelt_b32(np, n);
            ay1 = svld1_f32(pg, y + np);
            ay1 = svmul_f32_m(pg, ay1, vs);
            ay1 = svadd_f32_m(pg, ay1, vb);
            svst1_f32(pg, y + np, ay1);
        }
    #else
@ggerganov (Member):
Remove this SVE implementation - we don't have hardware to test it yet.

@ngxson (Collaborator, Author) replied:
done in 50c678f

ngxson marked this pull request as ready for review on July 9, 2025 at 09:57
@ngxson (Collaborator, Author) commented on Jul 9, 2025

The only backend that doesn't currently support the op is CANN. @ggerganov, could you tag the CANN contributors?

Also, it would be nice if we could launch the full CI to test CUDA and SYCL, but I'm not sure how to do that (or whether it's even possible when the PR is created from a forked repo).

@ggerganov (Member) left a comment

Running this through ggml-ci would be nice. You can just push a tmp branch and check its results - no need to recreate the PR.

    memcpy(&scale, dst->op_params, sizeof(float));
    memcpy(&bias, (float *) dst->op_params + 1, sizeof(float));
@ggerganov (Member):

Make this more consistent:

    memcpy(&scale, (float *) dst->op_params + 0, sizeof(float));
    memcpy(&bias,  (float *) dst->op_params + 1, sizeof(float));

Comment on lines 358 to 363:

    #if defined(__ARM_FEATURE_SVE)
        // scalar; TODO: write SVE code
        for (int i = 0; i < n; ++i) {
            y[i] = y[i]*s + b;
        }
    #else
@ngxson (Collaborator, Author):

GGML_F32_STEP doesn't seem to be defined on ARM SVE, so I left the scalar impl here.

@ggerganov (Member):

Yes, it's ok for now. I'm having some doubts about these SVE branches - might end up removing them altogether.

@ngxson (Collaborator, Author) commented on Jul 9, 2025

I ran the ggml-ci and it passed: b7c6ece

Will merge this once the CI of this PR is green.

ngxson merged commit 98bab63 into ggml-org:master on Jul 9, 2025 (49 checks passed)
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jul 9, 2025
* origin/master:
llama : support Jamba hybrid Transformer-Mamba models (ggml-org#7531)
ggml : add ggml_scale_bias (ggml-org#14417)
qnixsynapse pushed a commit to menloresearch/llama.cpp that referenced this pull request Jul 10, 2025
* ggml : add ggml_scale_bias
* ggml_vec_mad1_f32
* add more simd
* add CUDA
* sycl
* vulkan
* cann (placeholder)
* opencl
* will this fix cpu?
* fix cuda
* suggestions from coderabbit
* fix cann compile error
* vDSP_vsmsa
* rm __ARM_FEATURE_SVE
* use memcpy for op params
* make code looks more consistent
* use scalar for __ARM_FEATURE_SVE
* add x param to ggml_vec_mad1_f32